Closed
Conversation
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
This PR implements support for recursive CTEs based on @isidentical's design here #462 (comment)
Caveats:
It does not attempt to place safeguards against infinite recursion. That could be implemented as a follow-up PR. It would be great to have some syntax like
OPTON (MAXRECURSION 100)to prevent infinite recursion. Since the sql parser crate doesn’t support OPTIONS (MAXRECURSION) I might create a max-recursion parameter to the logical and physical plans that can be inserted manually once the logical plan is produced via SQL as a near-term workaround to protect against infinite recursion. I'll do that in a separate follow-on PR though.It's slower than it needs to be because of the execution plan inserts coalesce and repartition execution steps within the recursive term. This optimization is solved via this follow-on PR. There is also room to omit gathering of statistics for each execution iteration because setting up statistics on each execution iteration is expensive.
Which issue does this PR close?
Closes #462
Rationale for this change
Datafusion should support recursive CTEs so that it can express calculations that depend on the results of previous iterations.
What changes are included in this PR?
Are these changes tested?
Yes, they're tested via the SQL tests. The SQL tests use CSVs introduced by this PR apache/arrow-testing#93
Are there any user-facing changes?